USEFULNESS OF THE SMALL
HEAT SHOCK GENES FOR
PHYLOGENETIC ANALYSIS IN
PLANTS


ABSTRACT 


Sequences from chloroplast DNA and rDNA genes have proved useful in estimating phylogenetic relationships 
among plants. However, the coding regions of rDNA evolve slowly and are not useful at lower levels of analysis. 
There are chloroplast-encoded genes that evolve at a higher rate, but since chloroplast DNA is uniparentally inherited, 
nuclear markers are also needed when introgression or hybridization has occurred. There is then a need for additional
nuclear genes that can be used in phylogenetic analysis. Nuclear genes have been largely unused in phylogenetic 
analysis for several reasons, including the difficulty of distinguishing orthologous from paralogous genes and the 
problems that can ensue from constructing phylogenetic trees from data matrices that contain both types of genes. 
This paper is an evaluation of the potential of the genes encoding the small heat shock proteins for phylogenetic 
analysis of plants. The small heat shock genes in plants are a super gene family composed of four gene families. In 
this paper I present restriction site analysis of the small heat shock genes from the Brassicaceae, and similarity, rate 
of evolution, and phylogenetic analysis of small heat shock gene sequences from monocots and dicots. I show that 
these genes possess the necessary genetic variation for phylogenetic analysis, that the gene families are easily
distinguishable from each other, and that, in at least some of the gene families, orthologous and paralogous genes do
not pose a problem for phylogenetic analysis.


In recent years the availability of molecular data 
has vastly increased our ability to test hypotheses 
of both phylogenetic relationships among plant 
groups and population-level evolutionary process- 
es. These studies, with few exceptions, have relied 
primarily on chloroplast DNA (cpDNA) and nuclear 
ribosomal DNA (rDNA) (Clegg & Zurawski, 1992; 
Hamby & Zimmer, 1992). The large numbers of 
cpDNA sequences (Chase et al., 1993) across many 
taxonomic groups have enabled scientists to test 
hypotheses concerning rates of evolution of genes 
used as markers in phylogenetic analysis, in ad- 
dition to addressing phylogenetic questions (Bous- 
quet et al., 1992; Gaut et al., 1992). However, 
the recognition of the distinction between gene 
trees and species trees and the effects that hy- 
bridization can have on phylogenies constructed 
from uniparentally inherited genes (Dorado et al., 
1992; Doyle, 1992; Pamilo & Nei, 1988; Rieseberg
et al., 1990; Rieseberg & Brunsfeld, 1992;
Rieseberg & Soltis, 1991) has led to the awareness
that additional molecular markers are needed. In 
addition, there is concern over the effects of mul- 
tiple changes at one site (Smith, 1989) and sec- 
ondary structure and compensatory mutations 
(Dixon & Hillis, 1993) on phylogenetic reconstruc- 
tion using rDNA sequences. 

Systematists have been wary of using nuclear 
genes for phylogenetic analysis because many nu- 
clear genes are members of multi-gene families 
(Doyle, 1993). The inability to distinguish orthol- 
ogous and paralogous genes within a multi-gene 
family could undermine attempts to construct or- 
ganismal relationships. Gene conversion and re- 
combination within gene families will obscure or- 
ganismal relationships, and this may not be 
immediately obvious from the cladograms (San- 
derson & Doyle, 1992). Thus, prior to its use in 
phylogenetic analysis the evolutionary dynamics of 
a gene should be examined. In this paper the small 
heat shock protein genes are examined for their
usefulness in phylogenetic reconstruction. 

Organisms respond to heat stress with the rapid 
induction of the heat shock genes and subsequent 
production of heat shock proteins (hsp) that are 
necessary for thermal tolerance. The heat shock 
genes are members of multi-gene families based on 
the size of the proteins the genes encode. The small 
heat shock proteins are less than 30 kDa (kilodaltons)
in size. The large heat shock proteins range
from 60 to 100 kDa. The 90, 70, and 60 kDa 
proteins are the most extensively studied of all of 
the heat shock proteins. The large heat shock pro- 
teins are chaperone proteins, i.e., they help in the 
proper folding and translocation of other proteins 
(Becker & Craig, 1994; Gething & Sambrook, 
1992). The function of the small heat shock pro- 
teins, however, remains elusive. In most organisms, 
other than plants, the majority of the heat shock 
proteins produced during stress are large. The small 
heat shock proteins are present, but are a small 
percentage of the total protein. In Drosophila there 
are just four small heat shock genes, and in Sac- 
charomyces there is only one (Lindquist & Craig, 
1988; Susek & Lindquist, 1989). In contrast, plants 
produce a variety of small heat shock proteins 
(Vierling, 1991). There are four small heat shock 
protein gene families in plants. The four classes 
are based on overall sequence similarity and cel- 
lular localization. There are two classes of proteins 
that localize to the cytosol, Class I and Class II. 
The third class of proteins is found in the endoplasmic
reticulum, and the fourth in the chloroplast.
The carboxyl-terminal domain is more conserved
than the amino-terminal domain both within
and across protein classes (Vierling, 1991). Vier- 
ling (1991) suggested that the four classes of small 
heat shock genes in plants may have evolved 
through gene duplications. 


MATERIALS AND METHODS 


CLONING AND MAPPING OF A SMALL HEAT 
SHOCK GENE IN BRASSICA NIGRA 


Brassica nigra (CrGC accession number 2-1) 
seeds were obtained from the Crucifer Genetic Co- 
operative (CrGC, Madison, Wisconsin). Seeds were 
germinated and planted in Terra-Lite (Grace Hor- 
ticultural Products, Cambridge, Massachusetts) and 
grown in growth chambers in the Washington Uni- 
versity Biology Department Plant Growth Facility 
(St. Louis, Missouri). Plants were harvested when 
they were two to three weeks old. Leaf tissue was 
ground to a powder in liquid nitrogen and stored
at -80°C. Total genomic DNA was isolated from
individuals using a modified CTAB procedure with 
a phenol-chloroform extraction (King & Schaal, 
1990). 

Genomic DNA from Brassica nigra was cut 
with Xba I and was ligated into Lambda Zap vector 
(Stratagene, San Diego, California) also cut with 
Xba I. The lambda library was amplified in NM522 
cells obtained from Stratagene. Lambda plaques 
were transferred from agar plates to nylon filters 
by the filter-lift procedure (Sambrook et al., 1989). 
Plaques of the heat shock gene were identified in 
the library using a cDNA clone of a small heat 
shock gene from Glycine max, pCE53 (Schöffl &
Key, 1982). The nylon filters (Magna nylon, MSI)
were pre-hybridized and hybridized with hexamer- 
labeled pCE53 using standard protocols (King & 
Schaal, 1990) at moderate stringency: 2 x 15 
min. with 2 x SSC 0.1% SDS at 25°C and 2 x 
25 min. with 1 x SSC 0.1% SDS at 37°C (Sam- 
brook et al., 1989). Plaques were isolated and 
pBluescript plasmids were obtained from the plaques 
according to the Stratagene in-vivo excision pro- 
tocol. 

Plasmid DNA was obtained by alkaline lysis fol- 
lowed by phenol extraction (Sambrook et al., 1989) 
and digested with the following restriction enzymes: 
Acc I, Hind III, Sal I, Pst I, Eco RI, Xba I, and
Xho I (New England Biolabs). Reactions were con- 
ducted according to manufacturer’s recommen- 
dations. Restriction digests were electrophoresed 
in 0.85% agarose gels in tris-acetate buffer. The 
DNA was visualized under UV light by staining 
with ethidium bromide. The DNA was transferred 
to nylon filters using standard Southern blotting 
procedures. The coding region of the genes was 
identified by hybridization of the plasmid DNA digests
with a radiolabeled cDNA clone of a 17.6 kDa
hsp from Arabidopsis thaliana (Helm & Vierling, 
1989). 


SURVEY OF GENETIC VARIABILITY 


Genomic DNA was isolated, using the CTAB 
extraction procedure described above, from each 
of 15 individuals of Brassica nigra (CrGC #2-1). 
Genomic DNAs were digested with Eco RI, and 
the restriction digests were electrophoresed in 0.8% 
agarose gels in tris-acetate buffer and transferred 
to nylon membrane using standard procedures. Ny- 
lon filters were hybridized with the hexamer-labeled 
small heat shock gene clone from B. nigra, de- 
scribed above. Filters were washed 2 x 15 min. 
with 2 x SSC 0.1% SDS at 65°C and 2 x 45 
min. with 0.1 x SSC 0.1% SDS at 65°C and 
exposed to x-ray film with intensifying screens at
-80°C for three to five days.

To assess within-population restriction site vari- 
ation in the small heat shock genes, seeds of Bras- 
sica oleracea, Cardamine keyserri, and Rorippa 
schlechteri were collected in Papua New Guinea. 
Collections of seeds (at least 20 seeds from each 
of 10 individuals per population) were made in and 
around the village of Safalitikin, Urapmin, Sandaun 
Province. Voucher specimens were deposited in 
the herbarium at UPNG and at MO. 

Seeds were germinated and grown as described 
above. When plants were approximately four weeks 
old, leaves were harvested and stored at -80°C.
Eight individuals from two populations of Rorippa 
schlechteri and eight individuals from one popu- 
lation each of Brassica oleracea and Cardamine 
keyserri were surveyed for within-population vari- 
ation. DNA from individual plants was digested with 
Ban I, Eco RI, Hpa I, Hinf I, Hind III, Rsa I,
Pst I, and Taq I (New England Biolabs) according
to manufacturer’s instructions. Restriction digests 
were electrophoresed in 0.8-1.2% agarose gels in 
tris-acetate buffer. DNA was transferred to nylon 
membranes and hybridized with the hexamer-la- 
beled small heat shock clone from B. nigra, de- 
scribed above. Filters were washed 2 x 15 min. 
with 2 x SSC 0.1% SDS at 65°C and 2 x 45 
min. with 0.1 x SSC 0.1% SDS at 65°C and 
exposed to x-ray film with intensifying screens at
-80°C for three to five days.


SURVEY OF THE BRASSICA TRIANGLE AND 
OTHER BRASSICACEAE 


Restriction sites in the small heat shock genes 
were surveyed among the following species: Ro- 
rippa schlechteri, Raphanus sativus (CrGC #7- 
1), Cardamine keyserri, Arabidopsis thaliana 
(CrGC #9-3), Brassica nigra (CrGC #2-1), B. 
oleracea (CrGC #3-1), B. rapa (CrGC #1-1), B. 
carinata (CrGC #6-1), B. juncea (CrGC #4-1), 
and B. napus (CrGC #5-1). The Brassica triangle 
is composed of the diploids B. nigra, B. oleracea,
and B. rapa and the amphidiploids B. carinata,
B. juncea, and B. napus (Prakash & Hinata, 1980).

Genomic DNA from at least two individuals from 
each species was isolated and cut with Bam HI, 
Eco RI, Fok I, Hind III, Hae III, Pst I, Rsa I,
Sal I, and Xba I restriction enzymes. Digested 
DNAs were separated in 0.8% agarose gels, trans- 
ferred to nylon membranes, and hybridized with 
the hexamer-labeled small heat shock clone from 
Brassica nigra, described above. Filters were 
washed 2 x 15 min. with 2 x SSC 0.1% SDS at
65°C and 2 x 45 min. with 0.1 x SSC 0.1% SDS
at 65°C and exposed to x-ray film with intensifying
screens at -80°C for three to five days.

A portion of the gene encoding the small heat 
shock protein that is localized to the chloroplast 
was amplified from the genomic DNA of Arabi- 
dopsis thaliana, Cardamine keyserri, Brassica 
nigra, B. oleracea, B. napus, and B. juncea using 
Taq DNA polymerase, the manufacturer’s buffer 
(Boehringer Mannheim), and primers internal to 
the coding sequence of the A. thaliana gene (Os- 
teryoung et al., 1993). The primers were from 
base pairs 151-169 and 765-782 (using the start 
of the coding sequence as base pair 1). The DNA 
was amplified for 35 cycles with the following con- 
ditions: 94°C 1 min., 50°C 2 min., and 72°C 2 
min. Products were visualized after electrophoresis 
in a 0.9% agarose gel with ethidium bromide. 


SEQUENCE DATA ANALYSIS 


Gene sequences were obtained from GenBank 
(see Table 1). Sequences were compared using the 
Gap program in the Genetics Computer Group (GCG)
computer package. Alignments were generated us- 
ing Pileup in GCG and were refined by hand using 
Lineup (see Appendix). The DNA sequences coding 
for the transit peptides were not used in these 
analyses. Synonymous and nonsynonymous sub- 
stitutions were calculated using the computer pro- 
gram of Li (1993). Phylogenetic analyses of the 
small heat shock gene sequences were done using 
PAUP (3.1.1) (Swofford, 1993). Heuristic searches 
were conducted using 100 random addition taxon 
replicates with MULPARS and TBR branch swap- 
ping. A bootstrap analysis with 100 replicates was 
performed to assess support for branches. 
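
The two quantities used repeatedly below, pairwise similarity and bootstrap support, can be illustrated with a minimal Python sketch. It is not the GCG Gap or PAUP implementation: the similarity function simply counts matching bases over gap-free columns of an existing alignment, and the bootstrap function only draws one resampled alignment (tree searching on each replicate is not shown). The function names and the toy alignment are hypothetical and are not data from this study.

```python
import random


def percent_identity(seq_a, seq_b):
    """Percent of aligned, gap-free columns at which two sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    n_cols = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in cols)
            for name, seq in alignment.items()}


# Hypothetical toy alignment, for illustration only.
toy = {
    "taxon_A": "ATGGCT-AAGT",
    "taxon_B": "ATGGCAGAAGT",
    "taxon_C": "ATGACAGATGT",
}

print(percent_identity(toy["taxon_A"], toy["taxon_B"]))
replicate = bootstrap_alignment(toy, random.Random(0))
print(replicate)
```

In a full bootstrap analysis, a tree would be inferred from each of the 100 resampled alignments, and the support for a branch is the number of replicates in which that branch appears.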


RESULTS 


A full-length genomic clone encoding a cytosolic 
small heat shock protein was isolated. A restriction 
map of that clone is presented in Figure 1. The 
restriction enzyme Eco RI cleaves this clone twice. 
An Eco RI digest of genomic DNA of Brassica 
nigra probed with the clone of the small heat shock 
gene is presented in Figure 2. The sizes of the 
genomic Eco RI fragments in Figure 2 correspond 
to those predicted from the plasmid restriction map 
in Figure 1. The results of this Southern blot and 
others (data not shown) indicate that this gene is 
present in a single copy in B. nigra. 
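
The reasoning behind comparing the plasmid map with the genomic digest can be sketched in Python. The sketch assumes a linear sequence and the known Eco RI recognition site GAATTC, cut after the first G; the function name and the toy sequence are hypothetical and do not represent the B. nigra clone.

```python
def digest_linear(sequence, site="GAATTC", cut_offset=1):
    """Return fragment lengths after cutting a linear sequence at every site.

    cut_offset is the number of bases into the site at which the top strand
    is cut (1 for Eco RI, which cuts G^AATTC).
    """
    seq = sequence.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Hypothetical 112-bp sequence with two Eco RI sites, for illustration only.
toy_clone = "A" * 30 + "GAATTC" + "C" * 50 + "GAATTC" + "T" * 20
fragments = digest_linear(toy_clone)
print(fragments)
```

Two sites in a linear clone yield three fragments. In a genomic Southern blot only fragments overlapping the probe are detected, so a single-copy gene should show the bands predicted from the map and no additional ones.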


POPULATION-LEVEL VARIATION 


To determine if there is variability in the small 
heat shock gene within populations, I conducted a 
survey in Brassica nigra, B. oleracea, Cardamine 
FIGURE 1. Restriction map of a Brassica nigra small heat shock gene. The shaded box represents the coding
region.


TABLE 1. Sources of sequences used in phylogenetic analyses of small heat shock genes.

Species                        Protein   GenBank accession  Reference

A. Genes for the chloroplast-localized proteins:
Triticum aestivum Percival     Hsp 26A   X58280             Weng et al., 1991
Triticum aestivum              Hsp 26B   X67328             unpublished
Pisum sativum Poir.            Hsp 21    X07187             Vierling et al., 1988
Glycine max Merr.              Hsp 22    X07188             Vierling et al., 1988
Petunia hybrida Vilm.          Hsp 21    X54103             Chen & Vierling, 1991
Arabidopsis thaliana B. Heyne  Hsp 21    X54102             Chen & Vierling, 1991

B. Genes for endomembrane-localized proteins:
Pisum sativum                  Hsp 22    M33898             Helm et al., 1993
Glycine max                    Hsp 22    X63198             Helm et al., 1993

C. Genes for the class I cytosolically-localized proteins:
Zea mays Doebley & Iltis       Hsp 17.2  X65725             unpublished
Pisum sativum                  Hsp 18.1  M33899             Lauzon et al., 1990
Arabidopsis thaliana           Hsp 17.6  X16076             Goping et al., 1991

D. Genes for the class II cytosolically-localized proteins:
Zea mays                       Hsp 17.5  X54076             Goping et al., 1991
Zea mays                       Hsp 17.8  X54075             Goping et al., 1991
Triticum aestivum              Hsp 17.3  X58279             Weng et al., 1991
Pisum sativum                  Hsp 17.7  M33901             Lauzon et al., 1990
Glycine max                    Hsp 17.9  X07159             Raschke et al., 1988
Arabidopsis thaliana           Hsp 17.6  X63443             Barling et al., 1992
Pharbitis nil Roth             Hsp 18.8  M99430             Krishna et al., 1992
Pharbitis nil                  Hsp 17.2  M99429             Krishna et al., 1992
FIGURE 2. Eco RI digest of Brassica nigra genomic
DNA probed with the clone of the B. nigra small heat
shock gene.


keyserri, and Rorippa schlechteri. DNA from 15 
individuals of the CrGC rapidly cycling population 
of B. nigra was digested with Eco RI. This blot 
was probed with the full-length heat shock clone 
(containing both the coding and the flanking 
regions), and no polymorphisms were detected (data 
not shown). Populations of B. oleracea, R. schle- 
chteri, and C. keyserri were also surveyed. DNAs 
from eight individuals from each of these populations
were digested with eight restriction enzymes,
and the blots were probed with the full-length probe.
Again, there was no detectable variation for the
small heat shock gene in these populations (data 
not shown). 


VARIATION AMONG MEMBERS OF THE BRASSICACEAE 


Restriction site variation in the small heat shock 
gene was assessed among the members of the Bras- 
sica triangle and among other genera of the Bras- 
sicaceae, Arabidopsis thaliana, Cardamine key- 
serri, and Rorippa schlechteri. The relationships 
between the members of the Brassica triangle were 
established with cytological, morphological, and genetic
evidence (Prakash & Hinata, 1980), and
analysis of the chloroplast genomes established the 
maternal and paternal parents of the amphidiploids 
(Erickson et al., 1983; Palmer et al., 1983). Bras- 
sica carinata has the chloroplast genome of B. 
nigra, and B. juncea has the chloroplast genome 
of B. rapa. Although the data for B. napus are 
equivocal, it appears that in at least some acces- 
sions B. rapa is the maternal parent. DNAs from 
each of these species were digested with five re- 
striction enzymes and probed with the full-length 
clone and with a clone containing only the coding 
region. No restriction site polymorphisms were found 
within the coding regions among members of the 
Brassica triangle studied, including Raphanus 
(data not shown). However, variation in restriction 
sites was detected within the coding region among 
genera in the Brassicaceae. When the Southern 
blots were hybridized with the full-length clone, 
restriction site variation was observed among the 
members of the Brassica triangle, indicating se- 
quence divergence in the flanking region (Fig. 3). 

Primers internal to the coding region were used 
to amplify the gene encoding the small heat shock 
protein localized to the chloroplast from six species 
of the Brassicaceae. A single band was present in 
Brassica nigra, B. oleracea, Cardamine keyserri,
Arabidopsis thaliana, B. juncea, and B.
napus (Fig. 4). This indicates that this is a single-copy
gene and that PCR (Polymerase Chain Reaction)
amplification is easily accomplished.


SEQUENCE ANALYSIS 


Sequences from different gene families were 
compared within Arabidopsis thaliana and within 
Pisum sativum (Table 2). In this analysis, the gene 
for the chloroplast-localized protein in A. thaliana
was compared to other genes from A. thaliana
and the gene for the chloroplast protein in P.
sativum was compared to other genes in P. sativum;
no interspecific comparisons were made. The
gene sequences for the endomembrane proteins and 
both classes of the cytosolically localized protein 
are less than 50% similar to the gene for the protein 
localized to the chloroplast. In contrast, the genes 
for the chloroplast proteins are more than 50% 
similar to each other based on interspecific com- 
parisons (Table 3), even between monocots and 
dicots. Genes from each class or family are more 
similar to each other than they are to other small 
heat shock proteins in the same species. It is clear 
from these comparisons that there is no, or very 
little, gene conversion across gene families. 
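
The comparison in this paragraph can be restated as a simple check on the reported percentages. The Python sketch below uses only the values given in Tables 2 and 3; the variable names are illustrative. The T. aestivum A versus B value (96.5%) is left out because both copies come from the same species.

```python
# Similarity of the chloroplast-localized gene to genes of the other
# families within one species (values as reported in Table 2).
across_families = [47.9, 45.0, 44.5, 47.6, 41.6]

# Pairwise similarity among chloroplast-localized genes from different
# species (values as reported in Table 3).
within_family = [82.4, 65.7, 72.3, 63.1, 69.2, 63.3,
                 52.2, 59.6, 54.7, 56.2, 52.6, 59.2, 54.1, 55.5]

lowest_within = min(within_family)
highest_across = max(across_families)
families_separate = lowest_within > highest_across

print(f"lowest within-family similarity:  {lowest_within}%")
print(f"highest across-family similarity: {highest_across}%")
print(f"ranges do not overlap: {families_separate}")
```

Even the most divergent interspecific comparison within the chloroplast-localized family (monocot versus dicot) exceeds the highest similarity observed between families within a species, which is the pattern expected when gene conversion across families is rare or absent.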

Analysis of synonymous (Ks) and nonsynony- 
mous (Ka) substitutions (Table 4) indicates that the 
small heat shock genes are evolving at rates comparable
to those reported for other nuclear genes
(Wolfe et al., 1989). In the comparison among the 
grasses, Ks is not saturated, and Ka is, as expected, 
much lower than Ks. In the comparisons among 
the dicots (Table 4), Ks is saturated, reflecting the 
more distant relationships of these species. The Ka 
calculated for the small heat shock genes is com- 
parable to the rates reported for the gene for chal- 
cone synthase (Wolfe et al., 1989). Rates of syn- 
onymous and nonsynonymous substitutions were 
determined for three of the gene classes for Ara- 
bidopsis thaliana and Glycine max. The complete 
sequence of the gene for the chloroplast-localized 
protein in G. max has not been isolated, precluding 
a comparison of the rate for the entire gene. The 
gene classes are all evolving at approximately the 
same rate (Table 4). 
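
Two points in this paragraph lend themselves to a short numerical illustration: the ratio of Ka to Ks in the grass comparison, and what "saturated" means for a distance estimate. The Python sketch below takes the grass point estimates from Table 4. The saturation example uses the Jukes-Cantor correction, which is a simpler model swapped in for illustration and is not the method of Li (1993) used in the analysis; the function names are hypothetical.

```python
import math

# (Ks, Ka) point estimates for Z. mays vs. T. aestivum, as reported in Table 4.
grass_rates = {
    "Chloroplast-localized": (0.44, 0.09),
    "Class II": (0.36, 0.08),
}


def ka_ks_ratio(ks, ka):
    """Ratio of nonsynonymous to synonymous substitutions per site."""
    return ka / ks


def jukes_cantor_distance(p):
    """Jukes-Cantor corrected distance for a proportion p of differing sites."""
    if p >= 0.75:
        return math.inf  # the correction is undefined at full saturation
    return -0.75 * math.log(1 - 4 * p / 3)


ratios = {gene: ka_ks_ratio(ks, ka) for gene, (ks, ka) in grass_rates.items()}
for gene, ratio in ratios.items():
    print(f"{gene}: Ka/Ks = {ratio:.2f}")

# As the observed proportion of differences approaches 0.75, the corrected
# distance grows without bound and small sampling errors dominate.
for p in (0.10, 0.50, 0.70, 0.74):
    print(f"p = {p:.2f} -> d = {jukes_cantor_distance(p):.2f}")
```

Ratios well below 1, as in both grass comparisons, indicate that nonsynonymous changes accumulate much more slowly than synonymous ones. The second loop shows why synonymous estimates above about 2 are treated as saturated rather than interpreted as rates.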

In a phylogenetic analysis of the aligned gene 
sequences of the four major classes of proteins (the 
Class I and Class II cytosolically localized proteins, 
the endomembrane proteins, and the chloroplast 
proteins), the sequences of each gene family form 
well-supported clades (Fig. 5). Third positions of 
codons were omitted from the analysis because the 
rate analysis indicated that substitutions at these 
positions are saturated. A single tree of 936 steps 
with a consistency index (CI) of 0.640 was found 
in all the 100 heuristic searches. The order of gene 
duplications is not known, so the tree is arbitrarily 
rooted with the Class I sequences. A tree with the 
same topology was obtained using aligned amino 
acid sequences for the same genes (data not shown). 
Within each gene family, the gene phylogenies are 
roughly congruent with the species phylogenies. 
However, the taxon sampling is not the same in 
each gene family. There is weak support (i.e., boot- 
strap values less than 50%) for some of the branch- 
es within the Class II gene family. 
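
Omitting third codon positions from an aligned coding sequence is a mechanical step that can be sketched as follows. The Python sketch assumes the alignment is in frame, starts at the first position of a codon, and places gaps in multiples of three; the function name and the toy sequences are hypothetical.

```python
def drop_third_positions(aligned_cds):
    """Keep codon positions 1 and 2 of an in-frame aligned coding sequence."""
    return "".join(base for i, base in enumerate(aligned_cds) if i % 3 != 2)


# Hypothetical in-frame toy alignment, for illustration only.
toy_alignment = {
    "gene_1": "ATGGCTAAA---GGT",
    "gene_2": "ATGGCAAAGTCTGGC",
}
first_and_second = {name: drop_third_positions(seq)
                    for name, seq in toy_alignment.items()}
print(first_and_second)
```

The reduced matrix has two thirds of the original columns and excludes the positions at which most synonymous, and here saturated, substitutions occur.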


FIGURE 3. Hae III digest probed with the B. nigra
small heat shock gene. Lane 1, B. nigra; lane 2, B.
juncea; lanes 3 and 4, R. sativus; lanes 5 and 6, A.
thaliana; lane 7, B. oleracea.


DISCUSSION 


Phylogenetic analysis of molecular data sets has 
enabled evolutionary biologists to approach many 
questions not answerable with morphological data 
alone. In plant evolutionary studies, rDNA and 
cpDNA are the most frequently used molecular 
markers. Chloroplast DNA is useful in reconstruct- 
ing organismal relationships above the species level, 


TABLE 2. Similarity of the gene coding for the chloroplast-localized
protein to other small heat shock proteins
in Arabidopsis thaliana and Pisum sativum.

                        A. thaliana  P. sativum
Class I                 47.9%        44.5%
Class II                45.0%        47.6%
Endomembrane-localized               41.6%
FIGURE 4. Agarose gel stained with ethidium bromide. PCR products amplified using primers for the gene encoding
the chloroplast protein. Lane 1, Brassica nigra; 2, B. napus; 3, R. schlechteri; 4, B. carinata; 5, B. oleracea;
6, B. juncea; 7, A. thaliana; 8, C. keyserri. The A. thaliana PCR product is 590 bp.


through both restriction site and sequence analysis, 
due to its low rate of evolution. However, because 
chloroplasts are typically uniparentally inherited, 
the usefulness of cpDNA data sets may be limited 
when there has been hybridization and introgres- 
sion. For example, some individuals of Helianthus 
annuus have the rDNA of one species and the 
cpDNA of another (Dorado et al., 1992; Rieseberg 
et al., 1990; Rieseberg & Brunsfeld, 1992). Phy- 
logenetic relationships inferred from a tree based 
on cpDNA can contradict those inferred from an 
rDNA tree. Patterns of cpDNA distribution can 
also reflect geographical relationships rather than 
species designations (Brunsfeld et al., 1992; Matos, 
1992; Soltis et al., 1992). Studies such as these 
make clear the need for additional nuclear markers 
for use in phylogenetic studies. 

Many researchers have avoided using nuclear 
markers because of the concern that many nuclear 
genes are members of multi-gene families and that 
gene conversion among these genes would obscure 
orthologous (due to speciation and reflecting or- 
ganismal phylogeny) and paralogous (due to gene 
duplication and reflecting gene phylogeny) relationships.
In a recent review, Doyle (1993) stated
that “The nuclear genome would appear to have
limitless potential for phylogenetic studies . . . yet 
it remains largely untapped as a source of DNA 
characters.” Gene duplication and conversion are 
widely believed to be important in the evolution of 
multi-gene families (Ohta, 1987, 1988). Frequent 
gene conversion among members of multi-gene 
families maintains sequence similarity but also re- 
stricts the independent evolution of individual genes 
and results in concerted evolution. Two forces will 
maintain high levels of similarity among gene se- 
quences: selection and gene conversion. In gene 
conversion, related gene sequences are homoge- 
nized within a genome. Using computer simulation, 
Sanderson & Doyle (1992) examined the difficulty 
in determining true species relationships using se- 
quence data from multi-gene families when there 
has been gene conversion and recombination. The 
results of this study suggest that the true species 
tree can be inferred when gene conversion is high 
(i.e., as in rDNA) and when gene conversion is low,
but that intermediate levels of gene conversion will 
make it difficult to distinguish orthologous and par- 
alogous genes and will obscure species relation- 
ships. An analysis of sequences of the nuclear gene 
family encoding the small subunit of ribulose-1,5-
bisphosphate carboxylase (rbcS) in 17 genera,
including green algae and cyanobacteria, demon-
strated gene conversion among the rbcS genes
(Meagher et al., 1989). However, Meagher’s anal-
ysis also indicates that in spite of gene conversion,
rbcS sequences can be used in some instances to
infer generic and higher-level relationships.

There is no evidence for widespread gene con-
version across gene families of the small heat shock
genes, although they appear to be related to each
other through gene duplications. The small heat
shock genes have a higher level of similarity within
families across species than they do within species
across families; this most likely reflects selective
constraints. Similar results were reported in an
analysis of the amino acid sequences of the small
heat shock proteins (Vierling, 1991). Regions of
high similarity within a genome are necessary for
gene conversion to occur. Gene conversion would
result in higher levels of similarity among genes
across families, within a species. The small heat
shock genes are evolving at a rate comparable to
other nuclear genes and faster than genes encoded
in the chloroplast (Wolfe et al., 1989). A com-
parison of the nuclear Adh genes from Zea mays,
Triticum aestivum, and Hordeum vulgare found
a Ks of 0.66 and a Ka of 0.03 (Wolfe et al.,
1989). Between the Solanaceae and the Brassi-
caceae, the Ks was greater than 2.50, and Ka was
0.10 (Wolfe et al., 1989).

TABLE 3. Similarity of the nucleotide sequences of the genes for the chloroplast-localized proteins.

               P. sativum  G. max  P. hybrida  A. thaliana  T. aestivum A
P. sativum
G. max         82.4%
P. hybrida     65.7%       72.3%
A. thaliana    63.1%       69.2%   63.3%
T. aestivum A  52.2%       59.6%   54.7%       56.2%
T. aestivum B  52.6%       59.2%   54.1%       55.5%        96.5%

In a tree of representatives of all the small heat 
shock gene families, each gene family formed a 
well-supported clade (bootstrap values of 99- 
100%). However, there is a lack of support for 
some of the branches within the Class II cytosolic 
gene clade. In Pharbitis nil the 17.2 kDa protein
is induced by changes in photoperiod and heat 
shock, and the 18.8 kDa protein is induced only 
by heat shock. Due to the lack of sequence data 
for the small heat shock genes in related species, 
it is unclear whether these two genes are the prod- 
ucts of a recent or more ancient duplication. Hence, 
it is not possible to determine the paralogous and 
orthologous relationships between these genes and 
the other Class II sequences. It is this type of 
uncertainty that can make inferring organismal 
relationships problematical and perhaps unreliable. 
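
Whether the members of a gene family group together can be checked mechanically once a tree is available. The Python sketch below represents a rooted tree as nested tuples and asks whether a set of sequences forms a clade, that is, whether some subtree contains exactly those sequences. The tree and its labels are hypothetical and are not the tree of Figure 5; because that tree is arbitrarily rooted, a check of this kind on real data would depend on the chosen root.

```python
def leaves(tree):
    """Return the set of leaf labels of a tree given as nested tuples."""
    if isinstance(tree, str):
        return {tree}
    found = set()
    for child in tree:
        found |= leaves(child)
    return found


def is_clade(tree, taxa):
    """True if some subtree of the rooted tree has exactly these leaves."""
    taxa = set(taxa)
    if leaves(tree) == taxa:
        return True
    if isinstance(tree, str):
        return False
    return any(is_clade(child, taxa) for child in tree)


# Hypothetical rooted gene tree; suffixes mark the gene family of each leaf.
toy_tree = (
    ("pea_I", "arabidopsis_I"),
    (
        ("pea_II", "arabidopsis_II"),
        (("pea_er", "soybean_er"), ("pea_cp", "arabidopsis_cp")),
    ),
)

class_two = {"pea_II", "arabidopsis_II"}
same_species = {"pea_I", "pea_II"}
print(is_clade(toy_tree, class_two))
print(is_clade(toy_tree, same_species))
```

In the toy tree, sequences group by gene family rather than by species, the pattern that allows paralogous copies to be separated before species relationships are inferred within a family.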

The chloroplast-localized small heat shock genes 
may prove the most useful in phylogenetic analysis. 
These genes are present in a single copy in the 
Brassicaceae, and the gene tree reflects species 
relationships. In addition, these genes are longer 
than the other small heat shock genes and thus 
provide more characters. The sequences for the 
transit peptides were not used in these analyses 
due to the difficulty of aligning the sequences across 
distantly related taxa. However, the rapid evolution

TABLE 4. Rates of synonymous (Ks) and nonsynonymous (Ka) substitutions among the small heat shock genes.

                              Ks             Ka
Z. mays vs. T. aestivum
  Chloroplast-localized       0.44 (± 0.13)  0.09 (± 0.02)
  Class II Gene               0.36 (± 0.09)  0.08 (± 0.02)
A. thaliana vs. P. hybrida
  Chloroplast-localized       >2             0.16 (± 0.02)
A. thaliana vs. G. max
  Class I cytosolic           >2             0.21 (± 0.02)
  Class II cytosolic          >2             0.22 (± 0.02)
FIGURE 5. Tree of the small heat shock genes. The numbers above the branches are the numbers of times out
of 100 bootstrap replicates that the branch was present. The names of the genes encoding the Class I cytosolic
proteins are followed by I, those encoding the Class II cytosolic proteins by II, those encoding the endomembrane-
localized proteins by er, and those encoding the chloroplast-localized proteins by cp.
of these sequences may prove informative in close
comparisons. 

For a gene to be useful in phylogenetic analysis 
the following four criteria need to be met. First, 
the gene should be present in a single copy within 
a genome. Second, there should be no variation 
below the species level but sufficient genetic vari- 
ation between the species or taxa of interest. Third, 
if the gene is a member of a multi-gene family, 
orthologous and paralogous genes should be easily 
distinguishable. Fourth, the gene should have 
enough sequence similarity across the taxa of in- 
terest for Southern blot hybridization and primer 
annealing to be possible and no or few introns 
(Friedlander et al., 1992). In this paper, I have
presented evidence that the small heat shock genes 
may be useful for phylogenetic analysis of plants. 
Restriction Fragment Length Polymorphism (RFLP) 
analysis indicates that the small heat shock genes 
analyzed in the Brassicaceae are single copy and 
are not variable below the species level but are 
variable between genera. The gene sequences of 
the four families of genes are readily distinguishable 
based on overall similarity comparisons and phy- 
logenetic analysis. The sequences of the different 
gene families are sufficiently divergent (approxi- 
mately 50%) that cross-hybridization in Southern 
blot analysis is unlikely, while within-family se- 
quence conservation permits amplification of genes 
using PCR. However, not all the small heat shock 
genes may be useful. The orthologous and paral- 
ogous relationships among the genes for the Class 
II proteins may be complex. More sequences will 
be needed before this gene family can be fully 
evaluated for its utility in phylogenetic analysis. 

DNA sequences for protein-coding nuclear genes 
may be very useful in inferring phylogenetic re- 
lationships, but they should not be used without 
careful examination of their evolutionary dynamics,
including gene duplication and conversion. Of
the four small heat shock gene families, the genes
coding for the chloroplast-localized protein are the
most promising for phylogenetic study: a more
detailed study of their usefulness in phylogenetic
analysis is now in progress.